67 research outputs found

    Estimation of the Potential Detection of Diatom Assemblages Based on Ocean Color Radiance Anomalies in the North Sea

    Over the past years, a large number of new approaches in the domain of ocean color have been developed, leading to a variety of innovative descriptors for phytoplankton communities. One of these methods, named PHYSAT, currently allows the qualitative detection of five main phytoplankton groups from ocean-color measurements. Even though PHYSAT products are widely used in various applications and projects, the approach is limited by the fact that it identifies only dominant phytoplankton groups. This limitation is due to the use of biomarker pigment ratios for establishing empirical relationships between in-situ information and specific ocean-color radiance anomalies in open ocean waters. However, theoretical explanations of PHYSAT suggest that it could go beyond detecting dominance cases toward detecting phytoplankton assemblages. Thus, to evaluate the potential of PHYSAT for the detection of phytoplankton assemblages, we took advantage of Continuous Plankton Recorder (CPR) survey data collected in both the English Channel and the North Sea. The available CPR dataset contains information on diatom abundance in two large areas of the North Sea for the period 1998-2010. Using this unique dataset, recurrent diatom assemblages were retrieved by classifying CPR samples. Six diatom assemblages were identified in situ, each characterized by indicator taxa or species. Once this first step was completed, the in-situ analysis was used to empirically associate the diatom assemblages with specific PHYSAT spectral anomalies. This step was facilitated by previous classifications of regional radiance anomalies in terms of shape and amplitude, coupled with phenological tools. Through a matchup exercise, three CPR assemblages were associated with specific radiance anomalies. The maps of detection of these specific radiance anomalies are in close agreement with current in-situ ecological knowledge.
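    In PHYSAT, a radiance anomaly is the ratio of a pixel's normalized water-leaving radiance (nLw) spectrum to a reference nLw spectrum for the same chlorophyll a concentration, so that the first-order effect of Chl is removed and the remaining spectral shape can be related to phytoplankton groups or assemblages. The sketch below assumes the reference is simply the mean spectrum of all pixels falling in the same Chl bin; the function name, array layout and bin edges are hypothetical choices for illustration, not the PHYSAT implementation.

```python
import numpy as np


def radiance_anomalies(nlw, chl, bin_edges):
    """Return nLw divided by the mean nLw of pixels in the same Chl bin.

    nlw: (n_pixels, n_bands) normalized water-leaving radiances.
    chl: (n_pixels,) chlorophyll a concentrations.
    bin_edges: increasing Chl bin edges (hypothetical, e.g. log-spaced).
    Pixels whose Chl falls outside the edges get NaN anomalies.
    """
    nlw = np.asarray(nlw, dtype=float)
    chl = np.asarray(chl, dtype=float)
    # Bin index per pixel: -1 below the first edge, n_bins at or above the last.
    idx = np.digitize(chl, bin_edges) - 1
    n_bins = len(bin_edges) - 1
    ra = np.full_like(nlw, np.nan)
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            # Reference spectrum: mean nLw of all pixels in this Chl range.
            ref = nlw[mask].mean(axis=0)
            ra[mask] = nlw[mask] / ref
    return ra
```

    An anomaly near 1 at every band means the pixel looks like an "average" spectrum for its Chl level; systematic departures in shape or amplitude are what a classification of anomalies would then work with.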

    Gene3D: Multi-domain annotations for protein sequence and comparative genome analysis

    Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets including >6000 cellular genomes and >20 million unique protein sequences. This represents an increase of 45% in the number of protein sequences since our last publication. Thanks to improvements in the underlying data and pipeline, we see large increases in the domain coverage of sequences. We have expanded this coverage by integrating Pfam and SUPERFAMILY domain annotations, and we now resolve domain overlaps to provide highly comprehensive composite multi-domain architectures. To make these data more accessible for comparative genome analyses, we have developed novel algorithms for searching genomes to identify related multi-domain architectures. In addition to providing domain family annotations, we have now developed a pipeline for 3D homology modelling of domains in Gene3D. This has been applied to the human genome and will be rolled out to other major organisms over the next year.
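    Resolving overlapping domain predictions into one non-overlapping architecture can be framed as weighted interval scheduling: choose the set of mutually non-overlapping hits with the highest total score. The sketch below is a generic dynamic-programming version under that assumption; the (start, end, score) hit format and the function name are hypothetical, and this is not necessarily how Gene3D resolves overlaps.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical hit format: (start, end, score), 1-based inclusive residue range.
Hit = Tuple[int, int, float]


def resolve_overlaps(hits: List[Hit]) -> List[Hit]:
    """Return the highest-total-score subset of non-overlapping hits."""
    hits = sorted(hits, key=lambda h: h[1])  # sort by end position
    ends = [h[1] for h in hits]
    n = len(hits)
    best = [0.0] * (n + 1)  # best[i]: best total score using the first i hits
    take = [False] * (n + 1)
    prev = [0] * (n + 1)
    for i in range(1, n + 1):
        start, _end, score = hits[i - 1]
        # Count of earlier hits ending strictly before this hit starts;
        # because hits are sorted by end, these are exactly the first j hits.
        j = bisect_right(ends, start - 1, 0, i - 1)
        with_hit = best[j] + score
        if with_hit > best[i - 1]:
            best[i], take[i], prev[i] = with_hit, True, j
        else:
            best[i] = best[i - 1]
    # Trace back through the table to recover the chosen hits.
    chosen, i = [], n
    while i > 0:
        if take[i]:
            chosen.append(hits[i - 1])
            i = prev[i]
        else:
            i -= 1
    return chosen[::-1]
```

    For example, hits (1, 100, 50.0), (90, 200, 60.0) and (101, 180, 30.0) resolve to the first and third: their combined score of 80 beats the single 60-point hit that overlaps both.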

    FLORA: a novel method to predict protein function from structure in diverse superfamilies

    Predicting protein function from structure remains an active area of interest, particularly for the structural genomics initiatives, where a substantial number of structures are initially solved with little or no functional characterisation. Although global structure comparison methods can be used to transfer functional annotations, the relationship between fold and function is complex, particularly in functionally diverse superfamilies that have evolved through different secondary structure embellishments to a common structural core. The majority of prediction algorithms employ local templates built on known or predicted functional residues. Here, we present a novel method (FLORA) that automatically generates structural motifs associated with different functional subgroups (FSGs) within functionally diverse domain superfamilies. Templates are created purely on the basis of their specificity for a given FSG, and the method neither predicts functional sites in advance nor assumes specific physico-chemical properties of residues. FLORA is able to accurately discriminate between homologous domains with different functions and substantially outperforms popular structure comparison methods and a leading function prediction method, with a 2–3-fold increase in coverage at low error rates. We benchmark FLORA on a large data set of enzyme superfamilies from all three major protein classes (α, β, αβ) and demonstrate the functional relevance of the motifs it identifies. We also provide novel predictions of enzymatic activity for a large number of structures solved by the Protein Structure Initiative. Overall, we show that FLORA is able to effectively detect functionally similar protein domain structures using only patterns of structural conservation of all residues.

    Consistency between Satellite Ocean Colour Products under High Coloured Dissolved Organic Matter Absorption in the Baltic Sea

    Ocean colour (OC) remote sensing is an important tool for monitoring phytoplankton in the global ocean. In optically complex waters such as the Baltic Sea, relatively efficient light absorption by substances other than phytoplankton increases product uncertainty. Sentinel-3 OLCI-A, Suomi-NPP VIIRS and MODIS-Aqua OC radiometric products were assessed using Baltic Sea in situ remote sensing reflectance

    Performance of Ocean Colour Chlorophyll a algorithms for Sentinel-3 OLCI, MODIS-Aqua and Suomi-VIIRS in open-ocean waters of the Atlantic

    This is the final version. Available on open access from Elsevier via the DOI in this record.
    The proxy for phytoplankton biomass, Chlorophyll a (Chl a), is an important variable for assessing the health and state of the oceans, which are under increasing anthropogenic pressures. Before satellite ocean-colour Chl a is used operationally to monitor the oceans, rigorous assessments of algorithm performance are necessary to select the most suitable products. Due to their inaccessibility, the oligotrophic open-ocean gyres are under-sampled and therefore under-represented in global in situ data sets. The Atlantic Meridional Transect (AMT) campaigns fill the sampling gap in Atlantic oligotrophic waters. In-water underway spectrophotometric data were collected on three AMT field campaigns in 2016, 2017 and 2018 to assess the performance of Sentinel-3A (S-3A) and Sentinel-3B (S-3B) Ocean and Land Colour Instrument (OLCI) products. Three Chl a algorithms for OLCI were compared: Processing baseline (pb) 2, which uses the ocean colour 4 band ratio algorithm (OC4Me); pb 3 (OL_L2M.003.00), which uses OC4Me and a colour index (CI); and POLYMER v4.8, which models atmosphere and water reflectance and retrieves Chl a as part of its spectral matching inversion. The POLYMER Chl a for S-3A OLCI performed best. The S-3A OLCI pb 2 tended to under-estimate Chl a, especially at low concentrations, while the updated OL_L2M.003.00 provided significant improvements at low concentrations. OLCI data were also compared to MODIS-Aqua (R2018 processing) and Suomi-NPP VIIRS standard products. MODIS-Aqua exhibited good performance similar to OLCI POLYMER, whereas Suomi-NPP VIIRS exhibited a slight under-estimate at higher Chl a values. The differences arose because S-3A OLCI pb 2 remote sensing reflectances (Rrs) were over-estimated in the blue bands, which caused Chl a to be under-estimated. There were also some artefacts in the Rrs spectral shape of VIIRS, which caused Chl a to be under-estimated at values >0.1 mg m⁻³. In addition, using in situ Rrs to compute Chl a with OC4Me, we found a bias of 25% for these waters, related to the implementation of the OC4Me algorithm for S-3A OLCI. By comparison, the updated OLCI processor OL_L2M.003.00 significantly improved the Chl a retrievals at the lower concentrations corresponding to the AMT measurements. S-3A and S-3B OLCI Chl a products were also compared during the Sentinel-3 mission tandem phase (the period when S-3A and S-3B were flying 30 s apart along the same orbit). Both S-3A and S-3B OLCI pb 2 under-estimated Chl a, especially at low values, and the under-estimate was greater for S-3A than for S-3B. The performance of OLCI was improved by using either OL_L2M.003.00 or POLYMER Chl a. Analysis of coincident satellite images for S-3A OLCI, MODIS-Aqua and VIIRS, as composites and over large areas, showed that OLCI POLYMER gave the highest Chl a concentrations and percentage (%) coverage over the north and south Atlantic gyres, and OLCI pb 2 produced the lowest Chl a and % coverage.
    Funders: European Space Agency (ESA); Natural Environment Research Council (NERC); National Centre for Earth Observation (NCEO).
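    OC4Me belongs to the family of maximum band ratio algorithms, in which log10(Chl a) is a polynomial in the log10 of the largest blue-to-green Rrs ratio. The sketch below assumes blue bands near 443, 490 and 510 nm and a green band near 560 nm; the function name is hypothetical and the coefficients are placeholders that the caller must supply from the published values for a given sensor and processing version.

```python
import math
from typing import Sequence


def max_band_ratio_chl(rrs_blue: Sequence[float], rrs_green: float,
                       coeffs: Sequence[float]) -> float:
    """Chl a (mg m^-3) from an OC4-style maximum band ratio polynomial.

    rrs_blue: Rrs at the blue bands (assumed ~443, 490, 510 nm).
    rrs_green: Rrs at the green band (assumed ~560 nm).
    coeffs: placeholder polynomial coefficients a0, a1, ... (not OC4Me's).
    """
    if rrs_green <= 0 or max(rrs_blue) <= 0:
        raise ValueError("reflectances must be positive")
    # r = log10 of the largest blue/green ratio.
    r = math.log10(max(rrs_blue) / rrs_green)
    # log10(Chl) = a0 + a1*r + a2*r^2 + ...
    log_chl = sum(a * r ** k for k, a in enumerate(coeffs))
    return 10.0 ** log_chl
```

    With placeholder coefficients [0.0, -1.0], a maximum ratio of 10 gives r = 1 and a Chl a of 0.1 mg m⁻³. Because the blue-band Rrs sits in the numerator, an over-estimate of blue Rrs raises the ratio and, with the typically negative leading slope, pushes Chl a down, which is consistent with the under-estimate described above.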

    Extending CATH: increasing coverage of the protein structure universe and linking structure with function

    CATH version 3.3 (class, architecture, topology, homology) contains 128 688 domains, 2386 homologous superfamilies and 1233 fold groups, and reflects a major focus on classifying structural genomics (SG) structures and transmembrane proteins, both of which are likely to add structural novelty to the database and therefore increase the coverage of protein fold space within CATH. For CATH version 3.4 we have significantly improved the presentation of sequence information and associated functional information for CATH superfamilies. The CATH superfamily pages now reflect both the functional and structural diversity within the superfamily and include structural alignments of close and distant relatives within the superfamily, annotated with functional information and details of conserved residues. A significantly more efficient search function for CATH has been implemented using the Solr search server (http://lucene.apache.org/solr/). The CATH v3.4 webpages have been built using the Catalyst web framework.

    Composite structural motifs of binding sites for delineating biological functions of proteins

    Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands, including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs, which represent context-dependent combinations of elementary motifs. We demonstrate that function similarity can be better inferred from composite motif similarity than from the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs, each of which is regarded as a time-independent diagrammatic representation of a biological process. We show that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures.
    Comment: 34 pages, 7 figures.

    A new approach to assess and predict the functional roles of proteins across all known structures

    The three-dimensional atomic structures of proteins provide information about their function, and codified relationships between structure and function enable the assessment of function from structure. In the current study, a new data mining tool was implemented that checks current gene ontology (GO) annotations and predicts new ones across all the protein structures available in the Protein Data Bank (PDB). The tool overcomes some of the challenges of using large amounts of protein annotation and measurement information to form correspondences between protein structure and function. Protein attributes were extracted from the Structural Biology Knowledgebase and open source biological databases. A given protein's functional annotations were inferred from the presence or absence of a given set of attributes. The results show that attributes derived from the three-dimensional structures of proteins improved predictions over those made with attributes derived only from the primary amino acid sequence. Some predictions reflected known but not completely documented GO annotations. For example, predictions for the GO term copper ion binding reflected information, from a ligand interaction database, that a copper ion was known to interact with the protein. Other predictions were novel and require further experimental validation. These include predictions for proteins labeled as of unknown function in the PDB. Two examples are a role in the regulation of transcription for the protein AF1396 from Archaeoglobus fulgidus and a role in RNA metabolism for the protein psuG from Thermotoga maritima.

    Combinatorial Clustering of Residue Position Subsets Predicts Inhibitor Affinity across the Human Kinome

    The protein kinases are a large family of enzymes that play fundamental roles in propagating signals within the cell. Because of the high degree of binding site similarity shared among protein kinases, designing drug compounds with high specificity among the kinases has proven difficult. However, computational approaches to comparing the 3-dimensional geometry and physicochemical properties of key binding site residue positions have been shown to be informative of inhibitor selectivity. The Combinatorial Clustering Of Residue Position Subsets (CCORPS) method, introduced here, provides a semi-supervised learning approach for identifying structural features that are correlated with a given set of annotation labels. Here, CCORPS is applied to the problem of identifying structural features of the kinase ATP binding site that are informative of inhibitor binding. CCORPS is demonstrated to make perfect or near-perfect predictions for the binding affinity profile of 8 of the 38 kinase inhibitors studied, while having poor overall predictive ability for only 1 of the 38 compounds. Additionally, CCORPS is shown to identify shared structural features across phylogenetically diverse groups of kinases that are correlated with binding affinity for particular inhibitors; such instances of structural similarity among phylogenetically diverse kinases are also shown not to be rare. Finally, these function-specific structural features may serve as potential starting points for the development of highly specific kinase inhibitors.